Abstract

We study how risk-sensitive players act in situations where the outcome
is influenced not only by the state-action profile but also by its
distribution. In such interactive decision-making problems, the classical
mean-field game framework does not apply. We depart from most of
the mean-field games literature by presuming that a decision-maker may
include its own-state distribution in its decision. This leads to the class
of mean-field-type games. In mean-field-type situations, a single decision-maker
may have a big impact on the mean-field terms, for which new types
of optimality equations are derived. We establish a finite-dimensional
stochastic maximum principle for mean-field-type games where the drift
functions have a p-norm structure, which weakens the classical Lipschitz
and differentiability assumptions. Sufficient optimality equations are
established via the Dynamic Programming Principle, but in infinite dimension.
Using the de Finetti-Hewitt-Savage theorem, we show that a propagation of
chaos property with virtual particles holds for the non-linear McKean-Vlasov
dynamics.


1 Introduction 

Recently, there has been a renewed interest in optimization and game problems
of mean-field type, where the performance functionals, drifts, diffusions, and
jump coefficients depend not only on the state and the control but also on the
probability distribution of the state-control pair. Most formulations of mean-field
type optimization in [niniiiiiaiiiiiT] have been of risk-neutral type, where the
performance functionals are the expected values of stage-additive cost functions
of Bolza or Mayer type. Not all behavior, however, can be captured by risk-neutral
mean-field type optimization. One way of capturing risk-averse and
risk-seeking behaviors is by exponentiating the performance functional before
taking the expectation (see M). The objective of a risk-sensitive player is then to optimize
an exponentiated long-term loss. The risk-sensitive criterion is related to
robust control via relative entropy measures. As the risk-sensitive parameters
vanish, one gets a risk-neutral maximum principle of mean-field type.

1.1 On Mean-Field Games 

There are several pioneering works on static and/or stationary mean-field games.
Most of them appear under different names such as global games, anonymous games,
aggregative games, population games, large games, etc., but share many common
features. Here we limit ourselves to some pioneering works on dynamic
mean-field games. One of the first works on mean-field games is [1]. Therein,
the author proposes a game-theoretic model that explains why smaller firms
grow faster and are more likely to fail than larger firms in large economies. The
game is played over a discrete time space. The mean-field is the aggregate
demand/supply, which generates a price dynamics. The price moves forward in time:
the players react to the price and generate a demand, and the firm generates a supply with
an associated cost, which regenerates the next price, and so on. The author
introduced a backward-forward system to find equilibria (see for example
Section 4, equations D.1 and D.2 in [1]). The backward equilibrium equation
is obtained as an optimality condition for the individual response, i.e., the value function
associated with the best response to price, and the forward equation describes the
evolution of price. Therein, the consistency check is about the mean-field of
equilibrium actions (population or mass of actions), that is, the equilibrium
price solves a fixed-point system: the price regenerated after the reaction of
the players through their individual best-responses should be consistent with
the price they responded to. Following that analogy, a more general
framework was developed in [2], where the mean-field equilibrium is introduced in
the context of Markovian dynamic games with a large number of decision-makers.
A mean-field equilibrium is defined on page 4 of [2] by two conditions: (i) each
generic player's action is a best-response to the mean-field, and (ii) the mean-field
is consistent and is exactly reproduced from the reactions of the players. This
matching argument was widely used in the literature as it can be interpreted as
a generic player reacting to an evolving mean-field object while at the same time
the mean-field is formed from the contributions of all the players. The authors
of [39] show how common noise can be introduced into the mean-field
game model (so the mean-field distribution evolves stochastically) and extend
the Jovanovic-Rosenthal existence theorem. The methodology developed in [1] 
and the subsequent series of papers [a [391 uni Eu Ha l44] share the following 
assumptions: 

• (Big size) There is a large number of decision-makers, sometimes infinite,
or a continuum of decision-makers.

• (Anonymity) The index of the decision-maker does not affect the utility.

• (NonAtomicity) A single decision-maker has a negligible effect on the
mean-field term and on the utility.

Unfortunately, some of the above conditions appear to be very restrictive in
terms of applications, and we explain below how to relax them via mean-field-type
game theory.
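
The matching argument behind conditions (i) and (ii) can be illustrated on a toy static game. The sketch below is not taken from the cited works: the quadratic cost 0.5*(a - theta)^2 + c*a*m, the coupling constant c, and the function names are assumptions chosen so that the best response and the consistent mean-field are available in closed form (the fixed point is mean(theta)/(1 + c)).

```python
# Toy static game used only to illustrate the matching argument; the quadratic
# cost, the coupling constant c and all names below are assumptions.

def best_response(theta, m, c):
    """Condition (i): minimizer over a of 0.5*(a - theta)**2 + c*a*m."""
    return theta - c * m


def mean_field_equilibrium(thetas, c, damping=0.5, tol=1e-12, max_iter=10_000):
    """Damped fixed-point iteration on the mean action m (condition (ii))."""
    m = 0.0
    for _ in range(max_iter):
        # Every player responds to the current mean-field m.
        actions = [best_response(theta, m, c) for theta in thetas]
        # The responses regenerate a mean-field.
        m_regenerated = sum(actions) / len(actions)
        # Consistency: the regenerated mean-field equals the one responded to.
        if abs(m_regenerated - m) < tol:
            return m_regenerated
        m = (1.0 - damping) * m + damping * m_regenerated
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    print(mean_field_equilibrium([1.0, 2.0, 3.0], c=0.8))
```

In this toy setting the damped iteration contracts whenever |1 - damping*(1 + c)| < 1. Each player treats m as given, which is exactly the non-atomicity assumption that mean-field-type games drop.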

1.2 Related works on Mean-Field-Type Game Theory 

One decision-maker 

A stochastic maximum principle (SMP) for the risk-sensitive optimal control
problems for Markov diffusion processes with an exponential-of-integral
performance functional was elegantly derived in m using the relationship between
the SMP and the Dynamic Programming Principle (DPP), which expresses the
first order adjoint process as the gradient of the value-function of the
underlying control problem. This relationship holds only when the value-function is
smooth (see Assumption (B4) in [H]). The approach of m was widely used
and extended to jump processes in [19] and [20], [18], but still under this
smoothness assumption. However, in many cases of interest, the value function is, in
the best case, only continuous. Moreover, the relationship between the SMP
and the DPP is unclear for non-Markovian dynamics and for mean-field-type
game problems where the Bellman optimality principle needs to be extended.
This calls for a risk-sensitive SMP and DPP for these cases.
Djehiche et al. (2014, [22, 4]) have established a stochastic maximum principle
for risk-sensitive mean-field-type control where the key mean-field term is the
mean state. This means that the drift, diffusion, running cost and terminal cost
functions depend on the state, the control and the mean of the state. Our work
extends the results of m to risk-sensitive control problems for dynamics that
are non-Markovian and of mean-field type. One important contribution of [22]
is that the derivation of the SMP does not require any (explicit) relationship
between the first-order adjoint process and a value-function of an underlying
control problem. Using the SMP derived in m, the approach is easily
extended to the case where the mean-field coupling is in terms of the mean of
the state and the control processes. In [45], we have extended the
methodology to risk-sensitive mean-field-type control under partial observation, which has
interesting applications in risk-sensitive filtering problems including mean-field
ensemble Kalman filtering, state tracking and other data assimilation algorithms
in geosciences.

Two or more decision-makers 

The first paper that deals with risk-sensitive games in a mean-field context is
m. Therein, we have derived a verification theorem for a risk-sensitive
mean-field game whose underlying dynamics is a Markov diffusion, using a matching
argument between a system of Hamilton-Jacobi-Bellman (HJB) equations and
the Fokker-Planck equation. This matching argument freezes the mean-field
coupling in the dynamics, which yields a risk-sensitive HJB equation for the
value-function. The mean-field coupling is then retrieved through the
Fokker-Planck equation satisfied by the marginal law of the optimal state. The work in
m is fundamentally different from the present work. Therein, the mean-field
term is frozen to be the equilibrium mean-field term and a single decision-maker
cannot influence the mean-field term. In the present work, we show that,
when a single decision-maker has a non-negligible effect on the mean-field, the
fundamental optimality equations are changed. In [24] we have analyzed
risk-sensitive linear-exponentiated quadratic games of mean-field type, for which we
have provided closed-form expressions using a novel risk-sensitive stochastic
maximum principle derived in [22], which does not use the value function. It
allows us, in particular, to work with the SMP equations in situations where
the value function is not necessarily differentiable.

Substantial progress has been made in the last decade in mean-field games
in the non-cooperative setup. However, very little is known about cooperative
mean-field games. In [36] we have introduced cooperative mean-field-type games
in which the state dynamics and the payoffs depend not only on the state and
actions but also on their probability measure. We establish a time-dependent
payoff allocation procedure for coalitions of mean-field type. The allocated
payoff takes into account not only a fairness property but also the cost of forming the coalition.
Equations for both the time-consistent and the subgame-perfect solution concepts are
established.

1.3 Mean-field-type games: additional features

Risk-sensitive mean-field-type games [42] are fundamentally different from
risk-sensitive mean-field games. In mean-field game-theoretic models it is usually
assumed that (i) there is a very large number of players, (ii) players are indistinguishable
(in the sense of the strategies, payoffs, state laws), and (iii) the individual contribution
to the mean-field term is negligible. In mean-field-type games,
none of the assumptions (i)-(iii) is needed. Following [36], a mean-field-type
game is defined as any game in which the payoff and/or state dynamics involve
not only the state and action profiles but also the distribution of the state-action
pair (or its marginals, such as the distribution of states and the distribution of actions).
Mean-field-type game theory is suitable for one, two or more players. A typical
example is a single decision-maker with a mean-variance payoff. In mean-field-type
games: (i) a single player can have a big influence on the mean-field term. A
typical example is an air conditioning system which tries to reduce the variance
of the temperature state with respect to the desired comfort temperature of the
user. In that context, there is only one decision-maker, the user, who acts on
the controller. The control variable takes values in {Heating, Cooling, Nothing}.
Clearly, the control action has a significant impact on the variance of the
temperature. (ii) There is no need for players to be indistinguishable (see Section 3).
(iii) There is no need to have a large number (or infinite or continuum) of players.
The mean-field-type game framework allows us to address more interesting
real-world applications where the number of decision-makers may be large but still
finite, and to include both von Neumann and non-von Neumann utility functions.
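
The air conditioning example can be made concrete with a small simulation. The sketch below is only illustrative: the discrete-time noisy temperature model, the deadband rule, and all parameter values are assumptions, not part of the paper. It compares the mean-square deviation from the comfort temperature with and without a controller choosing among Heating, Cooling and Nothing.

```python
import random


def mean_square_deviation(controlled, target=21.0, steps=20_000, power=0.3,
                          noise=0.5, deadband=0.2, seed=0):
    """Time-averaged squared deviation from target of a noisy temperature.

    Hypothetical model: temp <- temp + u + noise * N(0, 1), where the control
    u is +power (Heating), -power (Cooling) or 0 (Nothing).
    """
    rng = random.Random(seed)
    temp = target
    total_sq = 0.0
    for _ in range(steps):
        if not controlled or abs(temp - target) <= deadband:
            u = 0.0          # Nothing
        elif temp < target:
            u = power        # Heating
        else:
            u = -power       # Cooling
        temp += u + noise * rng.gauss(0.0, 1.0)
        total_sq += (temp - target) ** 2
    return total_sq / steps


if __name__ == "__main__":
    print("controlled  :", mean_square_deviation(True))
    print("uncontrolled:", mean_square_deviation(False))
```

The single decision-maker acts directly on a second-moment (distribution-dependent) quantity, which is the sense in which one player can have a big influence on the mean-field term.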
1.4 Novelty and Contribution

Our contribution can be summarized as follows. We start with one-player
risk-sensitive mean-field-type optimization where the state dynamics has an $L^{\alpha}$-norm
structure, which is not differentiable. Our main motivation for considering this
structure comes from its applications to the control of virus spread among
interactive communities (networks) as observed in [25]. This allows us to consider
other types of non-linear mean-field interactions that are not investigated in
the literature on mean-field games. It also allows us to consider weakened
Lipschitz conditions and non-differentiable drift coefficients. We show that the
non-differentiability issue can be handled using weak derivatives or the sub-differential
set. We derive a stochastic maximum principle and a dual game variable which
satisfies the risk-sensitive SMP whenever the associated weak derivatives make
sense. In addition, a risk-sensitive DPP is provided in infinite dimension. We
believe that the present paper is the first work that analyzes risk-sensitive
mean-field-type games with the $L^{\alpha}$-norm, which is non-differentiable.

1.5 Structure of the paper 

The paper is organized as follows. In Section 2 we present the model and state
the main results for one player. Section 3 presents risk-sensitive mean-field-type
games with two or more players. We provide a dynamic programming principle
in infinite dimension in Subsection 3.2. Section 4 focuses on the control of
virus spread among interactive communities (networks). Section 5 concludes
the paper. For completeness, we provide in the Appendix the de Finetti-Hewitt-Savage
theorem and the existence and uniqueness proofs.

To streamline the presentation, we only consider the one-dimensional state
case. The extension to the multidimensional case is by now straightforward. The
norm is denoted with the index $\alpha > 1$, and $p$ will be used for the adjoint process
in the stochastic maximum principle. Also, it should be noted that our diffusion
coefficient is control independent. More general state-, control- and mean-field-dependent
diffusions are treated in [22]. The technique developed here
can also be easily extended to the jump-diffusion case using the works in [iniiniin].

2 Mean-field-type game with one risk-sensitive 
decision-maker 

Let $T > 0$ be a fixed time horizon, $\alpha > 1$, and let $(\Omega, \mathcal{F}, \mathbb{P})$ be a given filtered
probability space on which a one-dimensional standard Brownian motion
$B = \{B_s\}_{s \ge 0}$ is given, and the filtration $\mathbb{F} = \{\mathcal{F}_s,\ 0 \le s \le T\}$ is the natural filtration
of $B$ augmented by $\mathbb{P}$-null sets of $\mathcal{F}$. We consider the following risk-sensitive
problem:

\[
\inf_{u} J^{\theta}(u) \tag{1}
\]

subject to

\[
\begin{aligned}
dx^u(t) &= \bar b(\cdot, t, x^u(t), m^u(t), u(t))\,dt + \sigma(\cdot, t, x^u(t))\,dB(t),\\
x^u(0) &= x_0, \qquad m^u(t) := \mathcal{L}(x^u(t)),
\end{aligned}
\]

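where the state space is $\mathcal{X} = \mathbb{R}$, and the drift $\bar b$ is distribution-dependent and has
the special structure

\[
\bar b = \left( \int |b|^{\alpha}(\cdot, t, x^u(t), y, u(t))\, m^u(t, dy) \right)^{1/\alpha},
\]

i.e., the $L^{\alpha}$-norm of $b$ with respect to the measure $m^u(t, \cdot)$. Here

\[
\bar b(t, x, m, u) : [0,T] \times \mathcal{X} \times \mathcal{P}(\mathcal{X}) \times U \to \mathbb{R},
\]

$t \in [0,T]$, $x \in \mathbb{R}$, $m \in \mathcal{P}(\mathcal{X})$, $u \in U$. Notice that for $\alpha > 1$ the drift term $\bar b$ is
non-linear in the measure $m$. Moreover,

\[
b(\cdot, t, x, y, u) : [0,T] \times \mathcal{X} \times \mathcal{X} \times U \to \mathbb{R},
\qquad
\sigma(\cdot, t, x) : [0,T] \times \mathcal{X} \to \mathbb{R},
\]

and $m^u(t) := \mathcal{L}(x^u(t)) := \mathbb{P}_{x^u(t)}$ is the probability law of the random variable $x^u(t)$.
The parameter $\theta$ is the risk-sensitivity index of the player. The instantaneous
cost function is

\[
f(t, x, m, u) : [0,T] \times \mathcal{X} \times \mathcal{P}(\mathcal{X}) \times U \to \mathbb{R},
\]

and the terminal cost function is

\[
h(x, m) : \mathcal{X} \times \mathcal{P}(\mathcal{X}) \to \mathbb{R}.
\]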

The control strategy $u$ is chosen by the decision-maker. An admissible control
strategy $u$ is an $\mathbb{F}$-adapted and square-integrable process with values in a
non-empty subset $U$ of $\mathbb{R}^d$. We denote the set of all admissible strategies of the
player by $\mathcal{U}$.
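
To see concretely why the $L^{\alpha}$-norm drift is non-linear in the measure for $\alpha > 1$, one can evaluate it on empirical measures. The sketch below is an illustration under assumptions: the integrand `example_b` and the sample points are hypothetical, and the measure is replaced by a uniform empirical measure over a list of points.

```python
def l_alpha_drift(b, x, u, samples, alpha):
    """L^alpha-norm of y -> b(x, y, u) under the empirical measure of samples:
    ((1/n) * sum_j |b(x, y_j, u)|**alpha) ** (1/alpha)."""
    n = len(samples)
    return (sum(abs(b(x, y, u)) ** alpha for y in samples) / n) ** (1.0 / alpha)


def example_b(x, y, u):
    """Hypothetical integrand, chosen only for illustration."""
    return u + (y - x)


if __name__ == "__main__":
    x, u = 0.0, 0.0
    # m1 = Dirac mass at 0, m2 = Dirac mass at 2, mixture = (m1 + m2) / 2.
    for alpha in (1.0, 2.0):
        at_mixture = l_alpha_drift(example_b, x, u, [0.0, 2.0], alpha)
        averaged = 0.5 * (l_alpha_drift(example_b, x, u, [0.0], alpha)
                          + l_alpha_drift(example_b, x, u, [2.0], alpha))
        print(alpha, at_mixture, averaged)
```

For this non-negative integrand the two quantities coincide when alpha = 1 (the drift is linear in the measure), while for alpha = 2 the drift at the mixture exceeds the average of the drifts.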

Definition 1. A mean-field-type game is a game in which the payoff and/or
state dynamics involve not only the state-action profiles but also the
distribution of the state-action pair (or its marginals, such as the distribution of states and
the distribution of actions).

Example 1. Problem (1) is a mean-field-type game with one decision-maker.
The optimality equation of (1) is a nonstandard system from mean-field-type
optimal control [28].

Given an admissible strategy $u \in \mathcal{U}$ of the decision-maker (player), the state
equation in (1) is a measure-dependent stochastic differential equation (SDE)
with random coefficients.

In view of o, up to a change of the parameter 6 into —0, the optimization 
of is the same as the following optimization problem 


Any $\bar u(\cdot) \in \mathcal{U}$ satisfying

\[
J_{\theta}(\bar u(\cdot)) = \inf_{u(\cdot) \in \mathcal{U}} J_{\theta}(u(\cdot)) \tag{3}
\]

is called a risk-sensitive optimal strategy. The corresponding state process,
solution of the SDE in (1), is denoted by $\bar x(\cdot) := x^{\bar u}(\cdot)$. The mean-field-type
optimization problem that we are interested in is to characterize the pair $(\bar x, \bar u)$
solution of problem (1). Let $\Psi_T = \int_0^T f(t, x(t), m^u(t), u(t))\,dt + h(x(T), m^u(T))$.
Then the risk-sensitive loss functional is given by

\[
J_{\theta} = \frac{1}{\theta} \log J^{\theta} = \frac{1}{\theta} \log \left[ \mathbb{E}\, e^{\theta \Psi_T} \right].
\]

When the risk-sensitive index $\theta$ is small, the loss functional $J_{\theta}$ can be expanded
as

\[
\mathbb{E}[\Psi_T] + \frac{\theta}{2} \operatorname{var}(\Psi_T) + O(\theta^2),
\]

where $\operatorname{var}(\Psi_T)$ denotes the variance of $\Psi_T$. If $\theta < 0$, the variance of $\Psi_T$, as a
measure of risk, improves the performance $J_{\theta}$, in which case the optimizer is called
risk seeker. But when $\theta > 0$, the variance of $\Psi_T$ worsens the performance $J_{\theta}$, in
which case the optimizer is called risk averse. The risk-neutral loss functional
$\mathbb{E}[\Psi_T]$ can be seen as the limit of the risk-sensitive functional $J_{\theta}$ when $\theta \to 0$. This is
one of the reasons why this criterion has attracted a lot of attention. The criterion
also has interesting connections with $H_{\infty}$ mean-field-type optimization. This
is easily seen from the Donsker-Varadhan formula:



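\[
\log \int e^{\phi}\, d\nu = \sup_{\mu \in \mathcal{P}(\Omega)} \left\{ \int \phi\, d\mu - H(\mu | \nu) \right\} \tag{4}
\]

for any measurable bounded function $\phi$ on $\Omega$ and any probability measure $\nu$ on
$\Omega$. Moreover, the supremum is uniquely achieved by the imitative Boltzmann-Gibbs
distribution $\mu^*$ widely used in distributed strategic learning [26],

\[
d\mu^* = \frac{e^{\phi}\, d\nu}{\int e^{\phi}\, d\nu}.
\]

The function $H(\cdot | \cdot)$ is the relative entropy, given by

\[
H(\mu | \nu) = \int \log\left( \frac{d\mu}{d\nu} \right) d\mu
\]

whenever $\mu$ is absolutely continuous with respect to $\nu$; otherwise we
set $H(\mu | \nu) = +\infty$. Applying (4) with $\phi = \theta \Psi_T$ and $\nu = \mathbb{P}$, the problem for $\theta > 0$ is

\[
\inf_{u} J_{\theta}(u) = \inf_{u} \sup_{\mu \in \mathcal{P}(\Omega)} \left\{ \int \Psi_T\, d\mu - \frac{1}{\theta} H(\mu | \mathbb{P}) \right\}. \tag{5}
\]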
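
Both the small-$\theta$ expansion and the entropic (Donsker-Varadhan) representation can be checked numerically on a finite-valued loss. The following is a minimal sketch under assumptions: $\Psi_T$ is replaced by a random variable taking finitely many values with given probabilities, $\theta > 0$, and the helper names are hypothetical.

```python
import math


def risk_sensitive_value(outcomes, probs, theta):
    """(1/theta) * log E[exp(theta * Psi)] for a finite-valued loss Psi."""
    return math.log(sum(p * math.exp(theta * x)
                        for x, p in zip(outcomes, probs))) / theta


def gibbs_measure(outcomes, probs, theta):
    """Boltzmann-Gibbs tilting: mu*_k proportional to probs_k * exp(theta * x_k)."""
    weights = [p * math.exp(theta * x) for x, p in zip(outcomes, probs)]
    total = sum(weights)
    return [w / total for w in weights]


def relative_entropy(mu, nu):
    """H(mu | nu) = sum_k mu_k * log(mu_k / nu_k); +inf without absolute continuity."""
    h = 0.0
    for m, n in zip(mu, nu):
        if m > 0.0:
            if n == 0.0:
                return math.inf
            h += m * math.log(m / n)
    return h


def dual_objective(outcomes, probs, theta, mu):
    """E_mu[Psi] - H(mu | P) / theta, the quantity maximized over mu (theta > 0)."""
    mean_under_mu = sum(m * x for m, x in zip(mu, outcomes))
    return mean_under_mu - relative_entropy(mu, probs) / theta


if __name__ == "__main__":
    outcomes, probs, theta = [0.0, 1.0, 3.0], [0.5, 0.3, 0.2], 0.7
    mu_star = gibbs_measure(outcomes, probs, theta)
    print(risk_sensitive_value(outcomes, probs, theta))
    print(dual_objective(outcomes, probs, theta, mu_star))
```

The two printed numbers agree up to rounding, since the Gibbs measure attains the supremum. Any other probability vector gives a value of `dual_objective` that is no larger.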


2.1 Existence of solution to the state equation 

We now focus on the well-posedness of the state dynamics. 

Proposition 1. If the functions $b$ and $\sigma$ are Lipschitz with respect to $(x, y)$
with Lipschitz constant $L > 0$ and

\[
\int_0^T \left( \int |b(t, 0, y, u)|^{\alpha}\, m(t, dy) \right)^{1/\alpha} dt < +\infty,
\]

then the SDE in (1) admits a unique strong solution in $L^{\alpha}$. If in addition,

J (^J \b(,t,X,y,u)\'^°'m{t,dy)^ dt <-\-oo, 
a.s. then V a > 2, 


sup -/n {E ( sup \xi^n{t) - Xi^riit)]^]}^ <+oo, 

n \t<T J 

where Xi^n{t) a i—th particle state solution of 

I dxfn(t) = (iE”=i “ dt 

I = mo, 

and Xi^n(t) has the law of x"^ ft). 

Proof. See Appendix. □ 

As we show in Theorem 1 in the Appendix, the mean-field convergence of
the empirical measure $\frac{1}{n} \sum_{i=1}^{n} \delta_{x_{i,n}(t)}$ is a well-established result under
the de Finetti-Hewitt-Savage theorem. The issue here is to identify the limiting
measure $m$ with the particularity of the $L^{\alpha}$-norm structure. We provide an
example of a mean-field-type SDE in cooperative dynamics.

Example 2 (Effect of mean-field in cooperative dynamics). Consider the
mean-field stochastic dynamics with drift


II sin(-a;^(t) -|- xft)) - ysm{x{t) - ?/)||“m(t, dy) 


l/o 




where $\gamma > 0$, and constant diffusion coefficient $\sigma \in \mathbb{R}$. The first term in the drift,
the sine of a function of $x(t)$, is often replaced by a control action $u(t) \in [-1,1]$ to get

\[
dx(t) = \sigma\, dB(t) + \left( \int \| u(t) - \gamma \sin(x(t) - y) \|^{\alpha}\, m(t, dy) \right)^{1/\alpha} dt. \tag{7}
\]

This type of mean-field SDE model has been used to understand muscle
contraction (see Section 5 in ). Other similar models have been widely studied
in chemical kinetics, statistical mechanics and economics to capture cooperative
behavior of a generic particle, oscillator or agent.
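
A particle approximation of the controlled dynamics (7) can be sketched with an Euler-Maruyama scheme in which the law $m(t, \cdot)$ is replaced by the empirical measure of the particles. This is an illustration only: the constant control `u`, the coupling constant `gamma`, the time grid, and the function name are assumptions, and no convergence rate is claimed.

```python
import math
import random


def simulate_particles(xs0, alpha=2.0, gamma=1.0, sigma=0.3, u=0.5,
                       horizon=1.0, steps=100, seed=0):
    """Euler-Maruyama for interacting particles with the L^alpha-norm drift of (7).

    The law m(t, .) is approximated by the empirical measure of the particles;
    the control is held constant at u (an assumption made for simplicity).
    """
    rng = random.Random(seed)
    dt = horizon / steps
    xs = list(xs0)
    n = len(xs)
    for _ in range(steps):
        new_xs = []
        for x in xs:
            # Empirical L^alpha-norm of y -> u - gamma * sin(x - y).
            mean_pow = sum(abs(u - gamma * math.sin(x - y)) ** alpha for y in xs) / n
            drift = mean_pow ** (1.0 / alpha)
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_xs.append(x + drift * dt + noise)
        xs = new_xs
    return xs


if __name__ == "__main__":
    initial = [0.1 * k for k in range(-10, 10)]
    final = simulate_particles(initial)
    print(sum(final) / len(final))
```

Because the normed drift is non-negative by construction, switching the noise off (`sigma=0.0`) moves every particle weakly to the right, which gives a quick sanity check of an implementation.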

Note that the presence of the measures $m^u(s)$, $0 \le s \le T$, in the loss
function may cause time-inconsistency, in which case Bellman's Principle
with the state $x$ is no longer valid, and this motivates the use of the stochastic
maximum principle (SMP) approach to get a finite-dimensional framework.
Note, however, that one can apply the DPP where the state is the infinite-dimensional object
$\mu(t, \cdot)$, as shown in Section 3.2 (see also [37]).

2.2 Stochastic Maximum Principle 

We define the risk-neutral Hamiltonian associated with random variables
$X \in L^{\alpha}(\Omega, \mathbb{P})$ as follows: for $(p, q) \in \mathbb{R} \times \mathbb{R}$,

\[
H(t, X, m, u, p, q) := \bar b(t, X, m, u)\, p + \sigma(t, X)\, q - f(t, X, m, u). \tag{8}
\]

We also introduce the risk-sensitive Hamiltonian: for $\theta \in \mathbb{R}$ and
$(p, q, \ell) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}$,

\[
H^{\theta}(t, X, m, u, p, q, \ell) := \bar b\, p + \sigma (q + \theta \ell p) - f. \tag{9}
\]

We have $H = H^{0}$. The sign $(-f)$ is used here for the sole purpose of having a
maximum principle instead of a minimum principle and does not fundamentally
change the methodology. Moreover, we denote

\[
H^{\theta}_k(t) := p(t)\, \bar b_k(t) + (q + \theta \ell p)\, \sigma_k(t) - f_k(t), \tag{10}
\]

for $k \in \{x, m\}$.

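Note that even if $b$ is differentiable, the drift coefficient $\bar b$, which is

\[
\left( \int_{y \in \mathcal{X}} b^{\alpha}(\cdot, t, x^u(t), y, u(t))\, m^u(t, dy) \right)^{1/\alpha},
\]

may not be differentiable at the points where $\bar b(\cdot) = 0$. Denote by

\[
\bar b_x(x, m) = \frac{\int b^{\alpha-1}(\cdot, t, x, y, u)\, b_x(\cdot, t, x, y, u)\, m(t, dy)}{\bar b^{\alpha-1}(x, m)}
\]

if $\bar b^{\alpha-1}(x, m) > 0$. The case where it is negative is handled in a similar way.

The differentiation with respect to the measure $m$ is considered in a
Gateaux-derivative sense as in [55]:

\[
\lim_{\epsilon \to 0^+} \frac{d}{d\epsilon} \bar b(\cdot, t, x, m + \epsilon d) = \int \bar b_m(\cdot, t, x, m)(\xi)\, d(d\xi).
\]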

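Example 3. We provide Gateaux differentiation of $\|x\|_{\alpha}$-based functions:

• Mean state: Let $f(\cdot, t, x, m) = \int y\, m(t, dy)$. Then,

\[
\lim_{\epsilon \to 0^+} \frac{d}{d\epsilon} f(\cdot, t, x, m + \epsilon d) = \int \xi\, d(d\xi).
\]

The Gateaux derivative with respect to $m$ is $f_m(\cdot, t, x, m)(\xi) = \xi$. Then
$f_m(\cdot, t, \xi, m)(x) = x$ and $\partial_x[f_m(\cdot, t, \xi, m)(x)] = 1$. It is therefore clear that
$\partial_x[f_m(\cdot, t, \xi, m)(x)] = 1 \neq 0 = \partial_m[f_x]$.

• Square of the mean: Let $g(\cdot, t, x, m) = \frac{1}{2}\left( \int y\, m(t, dy) \right)^2$ and $\bar m = \int y\, m(t, dy)$. Then,

\[
\lim_{\epsilon \to 0^+} \frac{d}{d\epsilon} g(\cdot, t, x, m + \epsilon d) = \bar m \int \xi\, d(d\xi),
\]

so that $g_m(\cdot, t, x, m)(\xi) = \xi \bar m$. Hence, $g_m(\cdot, t, \xi, m)(x) = x \bar m$, and $\partial_x g_m(\cdot, t, X, m)(x) = \bar m$.

• Second moment: If $g(\cdot, t, x, m) = \frac{1}{2} \int y^2\, m(t, dy)$ then,

\[
\lim_{\epsilon \to 0^+} \frac{d}{d\epsilon} g(\cdot, t, x, m + \epsilon d) = \frac{1}{2} \int \xi^2\, d(d\xi),
\]

so that $g_m(\cdot, t, X, m)(\xi) = \frac{1}{2}\xi^2$. Hence, $g_m(\cdot, t, \xi, m)(x) = \frac{1}{2}x^2$ and $\partial_x g_m(\cdot, t, X, m)(x) = x$.

• $\alpha$-th moment: If $g(\cdot, t, x, m) = \int y^{\alpha}\, m(t, dy)$ then
$\lim_{\epsilon \to 0^+} \frac{d}{d\epsilon} g(\cdot, t, x, m + \epsilon d) = \int \xi^{\alpha}\, d(d\xi)$. Thus, $g_m(\cdot, t, \xi, m)(x) = x^{\alpha}$ and
$\partial_x g_m(\cdot, t, X, m)(x) = \alpha x^{\alpha-1}$.

• $\alpha$-norm: If $g(\cdot, t, x, m) = \left( \int |y|^{\alpha}\, m(t, dy) \right)^{1/\alpha} = m_{\alpha}^{1/\alpha}$, with $m_{\alpha} := \int |y|^{\alpha}\, m(t, dy)$, then,

\[
\lim_{\epsilon \to 0^+} \frac{d}{d\epsilon} g(\cdot, t, x, m + \epsilon d) = \frac{1}{\alpha}\, \frac{\int |\xi|^{\alpha}\, d(d\xi)}{m_{\alpha}^{(\alpha-1)/\alpha}}.
\]

Thus, $g_m(\cdot, t, \xi, m)(x) = \frac{1}{\alpha} \frac{|x|^{\alpha}}{m_{\alpha}^{(\alpha-1)/\alpha}}$, and
$\partial_x g_m(\cdot, t, X, m)(x) = \frac{|x|^{\alpha-1} \operatorname{sgn}(x)}{m_{\alpha}^{(\alpha-1)/\alpha}}$.

• $L^{\alpha}$-normed drift: We compute the Gateaux derivative of the $L^{\alpha}$-normed
drift: $\bar b_m(\cdot, t, x, m)(\xi) := \frac{1}{\alpha} \frac{b^{\alpha}(\cdot, t, x, \xi)}{\bar b^{\alpha-1}(t, x, m)}$. By changing variables, one has
$\bar b_m(\cdot, t, \xi, m)(x) = \frac{1}{\alpha} \frac{b^{\alpha}(\cdot, t, \xi, x)}{\bar b^{\alpha-1}(t, \xi, m)}$. We
differentiate with respect to $x$ to get:

\[
\partial_x \bar b_m(\cdot, t, \xi, m)(x) = \frac{b^{\alpha-1}(\cdot, t, \xi, x)\, b_y(\cdot, t, \xi, x)}{\bar b^{\alpha-1}(t, \xi, m)}. \tag{11}
\]

Hence

\[
\tilde{\mathbb{E}}\left[ L\, \partial_x \bar b_m(\cdot, t, \tilde X, m)(x) \right]
= \tilde{\mathbb{E}}\left[ L\, \frac{b^{\alpha-1}(\cdot, t, \tilde X, x)\, b_y(\cdot, t, \tilde X, x)}{\bar b^{\alpha-1}(t, \tilde X, m)} \right],
\]

where $\tilde{\mathbb{E}}$ denotes the expectation with respect to the tilde variables,
$\tilde X$ being a copy of $X$. We now replace the argument $x$ by $X$ to get

\[
\tilde{\mathbb{E}}\left[ L\, \partial_x \bar b_m(\cdot, t, \tilde X, m)(X) \right]
= \tilde{\mathbb{E}}\left[ L\, \frac{b^{\alpha-1}(\cdot, t, \tilde X, X)\, b_y(\cdot, t, \tilde X, X)}{\bar b^{\alpha-1}(t, \tilde X, m)} \right].
\]

If $\alpha = 1$, one gets $\tilde{\mathbb{E}}[L\, b_y(\cdot, t, \tilde X, X)]$.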
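
The Gateaux derivative of the $\alpha$-norm item can be checked by finite differences on a discrete measure. The sketch below rests on assumptions: the measure is a finite weighted sum of Dirac masses, the direction is a zero-mass signed vector of weights, and the kernel used is the one obtained from the chain rule, $|\xi|^{\alpha} / (\alpha\, m_{\alpha}^{1 - 1/\alpha})$.

```python
def alpha_norm(weights, points, alpha):
    """g(m) = (sum_k w_k * |y_k|**alpha) ** (1/alpha) for a discrete measure m."""
    return sum(w * abs(y) ** alpha for w, y in zip(weights, points)) ** (1.0 / alpha)


def gateaux_kernel(xi, weights, points, alpha):
    """Chain-rule kernel g_m(xi) = |xi|**alpha / (alpha * m_alpha**(1 - 1/alpha))."""
    m_alpha = sum(w * abs(y) ** alpha for w, y in zip(weights, points))
    return abs(xi) ** alpha / (alpha * m_alpha ** (1.0 - 1.0 / alpha))


def directional_derivative(weights, points, direction, alpha):
    """Integral of the kernel against the (signed) direction measure."""
    return sum(d * gateaux_kernel(y, weights, points, alpha)
               for d, y in zip(direction, points))


def finite_difference(weights, points, direction, alpha, eps=1e-6):
    """One-sided difference quotient (g(m + eps*d) - g(m)) / eps."""
    shifted = [w + eps * d for w, d in zip(weights, direction)]
    return (alpha_norm(shifted, points, alpha) - alpha_norm(weights, points, alpha)) / eps


if __name__ == "__main__":
    weights, points = [0.2, 0.5, 0.3], [-1.0, 0.5, 2.0]
    direction = [0.1, -0.3, 0.2]  # signed weights summing to zero
    print(directional_derivative(weights, points, direction, 3.0))
    print(finite_difference(weights, points, direction, 3.0))
```

The two printed values agree up to the discretization error of the difference quotient, provided $m_{\alpha} > 0$ so that the denominator does not vanish.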


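We now introduce the first order adjoint processes involved in the
risk-sensitive SMP. The (risk-sensitive) first order adjoint equation is the following
backward SDE of mean-field type:

\[
\begin{cases}
dp(t) = -\left\{ H^{\theta}_x(t) + \dfrac{1}{v^{\theta}(t)}\, \mathbb{E}\left[ v^{\theta}(t)\, \partial_x H^{\theta}_m(t) \right] \right\} dt + q(t)\left( -\theta \ell(t)\, dt + dB_t \right),\\[2mm]
dv^{\theta}(t) = \theta \ell(t)\, v^{\theta}(t)\, dB_t,\\[1mm]
v^{\theta}(T) = \phi^{\theta}(T),\\[1mm]
p(T) = -h_x(T) - \dfrac{1}{\phi^{\theta}(T)}\, \mathbb{E}\left[ \phi^{\theta}(T)\, \partial_x h_m(T) \right],
\end{cases} \tag{12}
\]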

where, 

Note that the Hamiltonian terms in (12) are evaluated at the optimal state
and optimal control $(\bar x(\cdot), \bar u(\cdot))$, i.e., $H^{\theta}_k(t, \bar x(t), \bar m(t), \bar u(t), p(t), q(t), \ell(t))$, $k \in \{x, m\}$.

Lemma 1 ([9, 8]). Consider the following mean-field backward SDE

\[
p(t) = p(T) + \int_t^T \tilde{\mathbb{E}}\left[ f(s, \tilde p(s), \tilde q(s), p(s), q(s)) \right] ds - \int_t^T q(s)\, dB(s),
\]

where $p(T)$ is a progressively measurable, square integrable random variable. Let
$f(t, \cdot)$ be Lipschitz for all times $t \in [0,T]$ and $t \mapsto f(t, 0, 0, 0, 0)$ be square
integrable over $[0,T]$. Then, the mean-field backward SDE has a unique adapted
solution satisfying

\[
\mathbb{E}\left[ \sup_{t \in [0,T]} |p(t)|^2 + \int_0^T |q(t)|^2\, dt \right] < \infty. \tag{14}
\]

Note that, by choosing $f(t, \tilde p(t), \tilde q(t), p(t), q(t)) = a_0(t, \cdot) + a_1(t, \cdot)\tilde p(t) + a_2(t, \cdot)\tilde q(t) + a_3(t, \cdot)p(t) + a_4(t, \cdot)q(t)$,
where the $a_i(t, \cdot)$ are measurable bounded coefficient functions,
one gets a backward equation in the form of the adjoint equations.

Proposition 2. If the functions $b, \sigma, f, h$ are twice continuously differentiable
with respect to $(x, m)$, and $b, \sigma, f, h$ and all their first order derivatives with
respect to $(x, m)$ are continuous in $(x, m, u)$ and bounded, then (12) admits an
$\mathbb{F}$-adapted solution $(p, q, v^{\theta}, \ell)$ such that

\[
\mathbb{E}\left[ \sup_{t \in [0,T]} |p(t)|^2 + \sup_{t \in [0,T]} |v^{\theta}(t)|^2 + \int_0^T \left( |q(t)|^2 + |\ell(t)|^2 \right) dt \right] < \infty. \tag{15}
\]

In addition, if b > 0 then 113\} admits a unique F-adapted solution. 

Proof. Under the assumptions of Proposition 2, the processes $p, q$ solve a
backward SDE coupled with the process $v^{\theta}$. Moreover, these equations can be
transformed into linear SDEs of mean-field type, which involve $(p, q, \mathbb{E}[p], \mathbb{E}[q])$ and
random coefficients involving $(x, v^{\theta}, \ell)$. We now check that the
coefficients of the linear SDEs do not blow up within the horizon $[0,T]$. For the
functions $\sigma, f, h$ and their derivatives, the boundedness follows from the
assumption. However, it is not immediate for the drift coefficient $\bar b$. We recall that
$\bar b_x, \bar b_m, \bar b_{xm}$ are not clearly defined at the points where $\bar b(\cdot, t, x, \cdot)$ is zero. Since
$\alpha > 1$, we replace these terms by any representation in the sub-differential set.
All terms are bounded by $M = \sup_{t \in [0,T]} \sup |b_x(t, \cdot)|$. The process $v^{\theta}(t)$ is
almost surely bounded. Then, for each direction chosen in the sub-differential,
the assumptions of Lemma 1 are fulfilled and hence the existence of a solution to
the first order risk-sensitive adjoint equations follows. Moreover, if $b > 0$ then
the denominator does not vanish and $\bar b_x$ and $\partial_x \bar b_m$ are (uniquely) well-defined
and bounded by $M$. Using Lemma 1 again we get existence and uniqueness of the
solution. □


Note that the boundedness and differentiability conditions can be weakened
by using the techniques developed in mm. The following Proposition is the
stochastic maximum principle for Problem (1).

Proposition 3. (Risk-sensitive maximum principle) Let the assumptions
of Proposition 2 hold. If $(\bar x(\cdot), \bar u(\cdot))$ is an optimal solution of the risk-sensitive
control problem (1), then there are two pairs of $\mathbb{F}$-adapted processes
$(p, q)$ and $(v^{\theta}, \ell)$ that satisfy (12)-(15), such that

\[
H^{\theta}(t, \bar x(t), \bar m(t), \bar u(t), p(t), q(t), \ell(t))
= \max_{u \in U} H^{\theta}(t, \bar x(t), \bar m(t), u, p(t), q(t), \ell(t)),
\]

for almost every $t \in [0,T]$ and $\mathbb{P}$-almost surely.

Proof. To prove the SMP, we use a logarithmic transformation and follow
similar steps as in [22]. □


Below we provide an explicit representation of the process of the SMP via a 
dual approach and partial differential equations of mean-field type. 

3 Mean-Field-Type Games: two or more risk- 
sensitive players 

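We now consider two or more risk-sensitive players. The risk-sensitivity index
of player $i$ is $\theta_i$. The best response to $(u_{-i}, m)$ is the following problem:

\[
\begin{cases}
\inf_{u_i} \dfrac{1}{\theta_i} \log \mathbb{E}\left[ e^{\theta_i \left[ \int_0^T f_i\, dt + h_i(x^u(T), m^u(T)) \right]} \right]\\[2mm]
\text{subject to}\\
dx^u(t) = \bar b(t, x^u(t), m^u(t), u(t))\, dt + \sigma(t, x^u(t))\, dB(t),\\
x^u(0) = x_0,\\
m^u(t) := \mathcal{L}(x^u(t)),
\end{cases} \tag{16}
\]

where $u_{-i}$ denotes $(u_1, \ldots, u_{i-1}, u_{i+1}, \ldots, u_n)$, $n \ge 2$, and by abuse of notation,
$u = (u_i, u_{-i})$. Note that we cannot impose indistinguishability of the players
since $\theta_i$ and the objectives $f_i, h_i$ may be different across the players.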

3.1 Main Result 

We now present the key results of the paper. The risk-sensitive game with cost
$J_i^{\theta}$ solves a system of risk-sensitive HJB equations



0 


(17) 


which is a partial differential equation with state $\mu$ (in infinite dimension). If we
denote $v_i^*(t, x, z) := V_{i,\mu}(t, \mu)(t, x, z)$ as a dual function (because of the Gateaux
derivative with respect to $\mu$), then $(v_1^*, \ldots, v_n^*)$ is in finite dimension and
solves the dual system



( 18 ) 


$i \in \{1, 2, \ldots, n\}$,

where $H_i$ is the Hamiltonian, $u_i \in \arg\min H_i$, $w = (x, z)$, $m(t, x) =
\int \mu(t, x, dz_1 \ldots dz_n)$, and $\mu(\cdot)$ solves the Kolmogorov equation in which $u$ is
replaced by the optimal strategies $(u_1, u_2, \ldots, u_n)$. Then $(p^*, q^*, \eta^*, l^*)$ solves the
(risk-sensitive) stochastic maximum principle system given by:


where, 


dp* = - 




+ i-Pd: + 


dv* = vd*dBi, 

p({T) = 0.e^i[^i(T)+hi(x(T),m(T))]^ 
q, = {-p*l* + 

p*{T) = KdT) + ^^E[4>dT)d,hi,^{T)], 

[_ i € {1,2,...,n} 




(19) 


( 20 ) 


3.2 Dynamic programming for risk-sensitive mean-field- 
type games 

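We establish a dynamic programming principle in infinite dimension. We first
write the objectives as a function of the infinite-dimensional state $\mu$, which
satisfies the Fokker-Planck-Kolmogorov forward equation

\[
\mu_t = -\partial_x[\bar b \mu] - \partial_z(f \mu) + \frac{1}{2} \partial_{xx}(\sigma^2 \mu) \tag{21}
\]

with the initial distribution $\mu(0, dx, dz) = m_0(dx)\, \delta_0(dz)$. The advantage now
is that $\mu(\cdot)$ is a deterministic object, so the cost can be rewritten in a deterministic
manner as

\[
J_i^{\theta} = \int \mu(T, dx, dz)\, e^{\theta_i \left( z_i + h_i\left( x, \int_z \mu(T, \cdot, dz) \right) \right)}.
\]

This is a terminal cost in the sense that it is evaluated only at $\mu(T, \cdot)$. Since there is no
running cost, one can write directly the HJB equation using classical calculus
of variations for

\[
V_i(t, \mu(t, \cdot)) = \inf_{u_i} \int \mu(T, dx, dz)\, e^{\theta_i \left( z_i + h_i\left( x, \int_z \mu(T, \cdot, dz) \right) \right)}
\]

starting from $\mu(t, \cdot)$ at time $t$:

\[
0 = V_{i,t} + \inf_{u_i} \left[ \langle \mu_t, V_{i,\mu} \rangle \right],
\]

where

\[
\begin{aligned}
\langle \mu_t, V_{i,\mu} \rangle
&= \int V_{i,\mu}(\mu)(x, z)\, \mu_t(x, z)\, dx\, dz && (22)\\
&= -\int V_{i,\mu}(\mu)(x, z) \left\{ \partial_x[\bar b \mu] + \partial_z(f \mu) \right\} dx\, dz && (23)\\
&\quad + \int V_{i,\mu}(\mu)(x, z)\, \frac{1}{2} \partial_{xx}(\sigma^2 \mu)\, dx\, dz && (24)\\
&= \int \left[ \bar b\, \partial_x V_{i,\mu} + f_i\, \partial_z V_{i,\mu} + \frac{\sigma^2}{2} \partial_{xx} V_{i,\mu} \right] \mu(t, dx, dz).
\end{aligned}
\]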

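As we can see, the required working state for player $i$ is $(x, z_i)$; therefore the
partial derivatives of $V_i$ with respect to $z$ are only considered for $z_i$. The
risk-sensitive HJB minimum principle yields

\[
0 = V_{i,t} + \int \left\{ \inf_{u_i} \left[ \bar b\, \partial_x V_{i,\mu} + f_i\, \partial_z V_{i,\mu} \right] + \frac{\sigma^2}{2} \partial_{xx} V_{i,\mu} \right\} \mu(t, dx, dz). \tag{25}
\]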

This is an infinite-dimensional PDE in $\mu$. Below we provide a simpler
optimality equation (i.e., the state will be in finite dimension) by setting $p_i^* = v^*_{i,x} / v^*_{i,z}$
and $V_{i,\mu}(\mu)(t, x, z) = v_i^*(t, x, z)$. Differentiating (25) with respect to $\mu$, one
gets

\[
0 = \partial_t V_{i,\mu}(\mu)(t, x, z) + H_i(\mu) + \int H_{i,\mu}(\mu)(t, x, z)\, \mu(t, dx, dz) \tag{26}
\]

where

2 

-ffz = + y (27) 

where $H_i^*(t, x, p^*, m) = \inf_{u}[\bar b\, p^* + f_i]$.

Definition 2. The function $v_i^*(t, x, z) := V_{i,\mu}(\mu)(t, x, z)$ is called the dual function
associated with the best response value of player $i$.

The dual function $v_i^*(t, x, z) := V_{i,\mu}(\mu)(t, x, z)$ solves the PDE (18), where the state is now
reduced to $(x, z)$, which is in finite dimension.

Below we show that if there exists a dual function (in the sense of weak
derivatives) then its weak derivatives provide a risk-sensitive SMP.

3.3 Dual functions associated with the best response values

In a risk-neutral setting, Bensoussan et al. have established in [28] a partial
differential equation as a necessary condition for optimality under a smoothness
assumption. We apply the methodology to the risk-sensitive case. The basic
idea consists in writing the optimality inequality as
$J_i^{\theta}(u_i + \epsilon d_i, u_{-i}, m^{(u_i + \epsilon d_i, u_{-i})}) - J_i^{\theta}(u, m^u) \ge 0$
for the cost functional $J_i^{\theta}$. By introducing the auxiliary state $z$
such that $dz_i = f_i(\cdot)\, dt$, $z_i(0) = 0$, the risk-sensitive game problem is
transformed into a mean-field-type game problem without running cost. The terminal
cost is $e^{\theta_i [z_i(T) + h_i(x(T), m(T))]}$. Since the state is augmented to be $(x, z)$, the first
order adjoint process becomes $(p_{1i}, p_{2i}, q_i)$ and the unmaximized Hamiltonian
is $\bar b\, p_{1i} + f_i\, p_{2i} + \sigma q_i$. Since the diffusion does not depend on the control $u$, the
derivative of this term with respect to $u$ can be written as $p_{2i}\, \partial_u[\bar b\, p_i^* + f_i]$, with $p_i^* = p_{1i} / p_{2i}$,
whenever $p_{2i} \neq 0$.

Let $v_i^*(t, x, z)$ be the dual function defined above, satisfying (18), where $\mu =
\mathcal{L}(x^u(t), z^u(t))$, the $x$-marginal is $m(t, \cdot) = \int \mu(t, \cdot, dz)$ and

\[
H_i^*(t, x, p^*, m) = \inf_{u}[\bar b\, p^* + f_i].
\]

The terminal condition is 

u*(r,x,z) = 


+0i y p{T,dx,dz). 
If {v* )i solves the dual equation (fT5l) then 


lim jf = 
€—^ 0 + dc 


I {t,x)&[0,T]xX 




,[t,x,m,u, —, 


aVi 


2v* 


-) p(t, X, z)dxdt 


and $H_{i,u} = 0$ for an interior optimal control $u$. Let $v$ be a function in the
Lebesgue space $L^1(I)$, with $I = [a,b]$ a compact interval of $\mathbb{R}$, $a < b$.
We say that $w \in L^1(I)$ is a "weak derivative" of $v$ if

$\int_I v(t)\varphi'(t)\,dt = -\int_I w(t)\varphi(t)\,dt,$

for all infinitely differentiable functions $\varphi$ with $\varphi(a) = \varphi(b) = 0$.
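As a quick numerical illustration of the weak-derivative definition (not part of the original derivation), the sketch below checks the integration-by-parts identity for $v(t) = |t|$ on $[-1,1]$, whose weak derivative is the sign function. The test function, the midpoint quadrature and all names are illustrative assumptions.

```python
import math


def integrate(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


a, b = -1.0, 1.0

# v(t) = |t| is not differentiable at 0, but it has a weak derivative.
v = abs
# Candidate weak derivative: w(t) = sign(t).
w = lambda t: math.copysign(1.0, t)

# A smooth test function vanishing at both endpoints (hypothetical choice).
phi = lambda t: (1.0 - t * t) ** 2 * math.exp(t)
dphi = lambda t: math.exp(t) * ((1.0 - t * t) ** 2 - 4.0 * t * (1.0 - t * t))

lhs = integrate(lambda t: v(t) * dphi(t), a, b)   # integral of v * phi'
rhs = -integrate(lambda t: w(t) * phi(t), a, b)   # minus integral of w * phi

print(lhs, rhs)  # the two values should agree up to quadrature error
```

The even number of midpoint cells keeps the kink of $|t|$ at a cell boundary, so the quadrature error stays small.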

Equation (18) is an interesting partial differential equation. Indeed, if there
is a solution $v_i^*(t,x,z)$ to (18) that is three times weakly differentiable, then
the partial weak derivatives of $v_i^*(t,x,z)$ solve the risk-sensitive SMP.
Below we identify explicitly the processes that solve the risk-sensitive SMP.
Let $p_i^*(t) = v^*_{i,x}/v^*_{i,z}$, where the derivatives are taken
in the distributional sense (weak derivatives). Then the process $p_i^*$ evaluated
at the optimal trajectory solves the backward SDE:


dp* = - 


H* 

l,X 


Vi 


dt 


-[ax{q:+P*l*)+q:i*]dt + q*dB, 


(28) 
where

$q_i^* = -p_i^* l_i^* + \frac{\sigma v^*_{i,xx}}{v^*_{i,z}}, \quad l_i^* = \sigma\,\partial_x[\log v^*_{i,z}],$

$v_i^* := V_{i,\mu}(\mu)(t,x,z) = \partial_{\mu} V_i(\mu)(t,x,z).$

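We impose strong smoothness on $v_i^*$ and show that $\eta_i^*$ solves a backward
SDE similar to the one satisfied by $v_i$.

Proposition 4. $\eta_i^* := v^*_{i,z}(t,x,z)$ solves the backward SDE:

$d\eta_i^* = \eta_i^* l_i^*\,dB, \quad \eta_i^*(T) = \theta_i e^{\theta_i[z_i(T) + h_i(x(T), m(T))]}.$ (29)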

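Proof. By Itô's formula, we have

$d\eta_i^* := dv^*_{i,z}(t,x,z)$ (30)
$\quad = \big[v^*_{i,zt} + v^*_{i,zx} b + v^*_{i,zz} f_i + \frac{\sigma^2}{2} v^*_{i,zxx}\big]dt + \sigma v^*_{i,zx}\,dB.$

From (18) it is clear that the partial (weak) derivative of the integral term with
respect to $z$ is zero (because it does not depend on $z$). We differentiate (18)
with respect to $z$ to get

$v^*_{i,zt} + \partial_z\big[v^*_{i,z}\,H^*(t,x,\frac{v^*_{i,x}}{v^*_{i,z}},m)\big] + \frac{\sigma^2}{2} v^*_{i,zxx} = 0,$

which means that the drift term is
$v^*_{i,zt} + v^*_{i,zx} b + v^*_{i,zz} f_i + \frac{\sigma^2}{2} v^*_{i,zxx} = 0$. Thus,
$d\eta_i^* = \sigma v^*_{i,zx}\,dB$. Let us compute the diffusion coefficient more explicitly:

$\sigma v^*_{i,xz} = \sigma v^*_{i,z}\,\partial_x[\log v^*_{i,z}] = \eta_i^* l_i^*.$ (31)

Hence, one gets

$d\eta_i^* = \eta_i^* l_i^*\,dB, \quad \eta_i^*(T) = \theta_i e^{\theta_i[z_i(T) + h_i(x(T), m(T))]}.$ □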


Proposition 5. The function $(\eta_1^*, \ldots, \eta_n^*)$ solves the partial
differential equation:

$0 = \eta^*_{i,t} + \eta^*_{i,x} b + \eta^*_{i,z} f_i + \frac{\sigma^2}{2}\eta^*_{i,xx}$ (32)

whenever these derivatives make sense. Moreover, $\eta_i^*$ has a constant sign and
has the same sign as $\theta_i$.

Proof. This follows from taking a weak derivative with respect to $z_i$ in the dual
equation. □

Note that the function $m$ in Proposition 5 is the marginal of $\mu$ with
respect to $x$, and $\mu$ solves the Fokker-Planck-Kolmogorov forward equation with
drifts $(b, f)$ and diffusion coefficient $(\sigma, 0)$:

$\mu_t + \partial_x[b\mu] + \partial_z(f\mu) - \frac{1}{2}\partial_{xx}(\sigma^2\mu) = 0,$ (33)
Table 1: Internet attacks over the globe. An increase of around 13.02%

Online attacks by region

North America            Europe                Asia                      Australia
New York     617 150     Germany   571 255     India        2 353 001    Australia   705 594
Virginia     452 916     Romania   226 018     China          772 447
Illinois     401 766     Sweden    138 543     Bangladesh     106 102
California   343 627
Texas        309 426
Ohio         110 137
Florida      106 779

$\mu(0,dx,dz) = m_0(dx)\,\delta_0(dz)$. In view of Propositions 4 and 5,
$(p_i, q_i, v_i, \ell_i) = (p_i^*, q_i^*, \eta_i^*, l_i^*)$ solves the risk-sensitive SMP and


dp* 


HL 


Vi V^ . 


dt 


-pWt? 




dt + {—p*l* -I- 


CTV, 


'-)dB, 


(34) 


p*{T) = h,MT),miT)) + 

E y,{ii{T)+h{x{T),m{T))) d^h,^^{x{T),m(T)){x{T))\ 
^ei(zi{T)+hi(x(T),m{T))) 

The function $v^*(t,\cdot) = \partial_{\mu} V$ is not the value function in the sense
of Bellman because of the presence of the term $E[H_{i,m}]$ in Eq. (18). $v^*(t,\cdot)$
is the adjoint function (dual function) associated with the mean-field-type
best-response problem. Interestingly, in the mean-field free case, i.e., when
$b_m = 0$, $f_{i,m} = 0$, $h_{i,m} = 0$, the dual function
$(v_1^*(t,\cdot), \ldots, v_n^*(t,\cdot))$ coincides with the best-response value
function of the risk-sensitive game problem with augmented state $(x, z)$.


4 Virus Spread over an evolving network 

WiFi network security has gained significant attention in research and industrial
communities as a result of the global connectivity provided by the Internet.
This has led to a variety of traditional defense mechanisms ranging from
cryptography, firewalls and antivirus software to intrusion detection systems.
Table 1 displays a sample number of network attacks by major geographic region
(State or Country) with more than 100 000 attacks. See [35] for more details on
real time web attack monitoring.

A virus that spreads through WiFi networks as effectively as a human cold
moves through cities, airports and public transport areas has been explored
recently. The virus can travel between WiFi networks via Access Points (APs)
that connect households and businesses to WiFi networks. It can also propagate
through femtocell and small cell networks. We denote by $x$ the state of
the entire network; $x$ could represent the number of access points that can be
reached with infected relays (hotspots) at a specific period of the day. Since the
number of active access points is highly stochastic and the number of
nodes in the network is time-varying, $x$ is a random process. We do not consider
a mass-action principle because there is no conservation of mass in this case:
the population itself is random. It is unclear whether the random process can be
driven by a Brownian motion, but here we assume a small noise effect for
simplicity. Users move over several geographical areas and some of them may carry
portable WiFi access points. Thus, the network is mobile and random. Each access
point may interact with other hotspots in a certain communication neighborhood,
and then the information/host propagates over multiple hops. In this setting an
increasing rate has been observed over the last decade. To capture this phenomenon,
we propose an $L^{\alpha}$-norm drift model for the rate [46]. The control parameter
$u_1$ of the attacker may represent, for example, the rate at which the virus
attempts transmission over the access points.

4.1 State dynamics models 

With the explosive growth of mobile devices and the Internet of Things (IoT), there
is an increasing number of vulnerable devices and machines connected to these
networks and hotspots, so that the mass is not conserved. The population tends to
grow (in expectation). To illustrate this we consider an information
propagation model where the state is given by

$dx = \gamma(x)\,dt + \sigma\,dB,$

where $\gamma(x) = \kappa x(1 - \frac{x}{K})$, $\kappa, K > 0$, $x(0) \in (0,K)$.
Figure 1 represents the evolution of the state under different noises. In Figure 1
the parameters are $\kappa = 10$, $K = 2$, and $\sigma \in \{0,1\}$, and different
noise terms are plotted. Starting from an initial state value $x(0) = 0.3$, we
observe that the population state tends to move around 2 within the time
interval $[0,1]$.
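The uncontrolled dynamics can be sketched with a basic Euler-Maruyama discretization. This is a minimal illustration, not the code used for the paper's figures: the step count, the seeds and the function name `simulate_logistic_sde` are assumptions, while the parameter values ($\kappa = 10$, $K = 2$, $x(0) = 0.3$) are those quoted in the text.

```python
import math
import random


def simulate_logistic_sde(x0=0.3, kappa=10.0, K=2.0, sigma=0.0,
                          T=1.0, steps=1000, seed=0):
    """Euler-Maruyama path of dx = kappa*x*(1 - x/K) dt + sigma dB on [0, T]."""
    rng = random.Random(seed)
    dt = T / steps
    x = x0
    path = [x]
    for _ in range(steps):
        drift = kappa * x * (1.0 - x / K)
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        x = x + drift * dt + sigma * dB
        path.append(x)
    return path


noiseless = simulate_logistic_sde(sigma=0.0)
noisy = simulate_logistic_sde(sigma=1.0, seed=1)
print(noiseless[-1])  # close to the carrying capacity K = 2 when sigma = 0
```

With additive noise the discretized state is not confined to $(0, K)$, so a noisy path may leave that interval; the sketch does not correct for this.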

4.2 The state needs to be controlled 

Due to the presence of malicious attacks in the network, there are many security
and privacy concerns, so the state needs to be controlled [3]. To illustrate the
model we introduce an attacker and a defender. Each of them has a control
parameter and has to decide on its value.

$dx = [\gamma(x) + u_1 - u_2]\,dt + \sigma\,dB,$

where $u_1$ is the attacker control strategy and $u_2$ is the defender control
strategy. In Figure 2 we represent the case where a significant control effort
$e = u_2 - u_1 = 0.3$ is injected into the system. We observe that the control
significantly affects the state dynamics and helps towards a certain goal. For a
significant effort $e = u_2 - u_1 = 0.3$ invested into security, the state of
infection can stay below the level 2 starting from level 1. This means that
control helps to reduce the infection rate and improve security.

Figure 1: State dynamics with three different noises. (Panels: no noise; noise
added using $\sigma W$. Horizontal axis: time, $0 < t < 1$.)

Figure 2: Effect of control effort $e = u_2 - u_1 = 0.3$ on the state. (Legend:
uncontrolled state dynamics. Noise added using $\sigma W$. Horizontal axis: time,
$0 < t < 1$.)

However, the location of mobile devices and WiFi hotspots may be important in
some cases, especially when local interactions and communications arise. In order
to capture this phenomenon we introduce a non-linear behavior via the distribution
$m(t,dy)$ of geographical location and intensity of interaction at time $t$, and
we introduce $b(t,x,y,u_1,u_2) = y[\gamma(x) + u_1 - u_2]$, so that the state
dynamics becomes

$dx = |\gamma(x) + u_1 - u_2|\,\big[\int y^{\alpha} m(t,dy)\big]^{1/\alpha}dt + \sigma\,dB.$

The variable $y$ can be seen as the intensity of interaction of infected
devices/hosts.
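To make the multiplicative mean-field drift concrete, here is a small particle sketch. It rests on assumptions that are not stated at this point in the text: the measure $m(t,dy)$ is replaced by the empirical measure of the simulated particles, $\gamma$ is the logistic rate with $\kappa = 10$, $K = 2$, and the horizon is kept short because the absolute value makes the drift non-negative. The names `alpha_norm`, `gamma` and `euler_step` are hypothetical.

```python
import math
import random


def alpha_norm(particles, alpha):
    """Empirical version of [ integral of y^alpha m(t, dy) ]^(1/alpha)."""
    mean_power = sum(abs(y) ** alpha for y in particles) / len(particles)
    return mean_power ** (1.0 / alpha)


def gamma(x, kappa=10.0, K=2.0):
    """Logistic growth rate (assumed form, taken from the state dynamics model)."""
    return kappa * x * (1.0 - x / K)


def euler_step(particles, u1, u2, alpha, sigma, dt, rng):
    """One Euler-Maruyama step of dx = |gamma(x)+u1-u2| * m_alpha^(1/alpha) dt + sigma dB."""
    mf = alpha_norm(particles, alpha)  # mean-field factor shared by all particles
    return [
        x + abs(gamma(x) + u1 - u2) * mf * dt
        + sigma * rng.gauss(0.0, math.sqrt(dt))
        for x in particles
    ]


rng = random.Random(0)
particles = [1.0] * 200                      # all particles start at level 1
for _ in range(50):                          # short horizon: 50 steps of size 1e-3
    particles = euler_step(particles, u1=0.0, u2=0.3, alpha=1.2,
                           sigma=0.1, dt=0.001, rng=rng)
print(sum(particles) / len(particles))       # empirical mean after the short run
```

Here the effort $e = u_2 - u_1 = 0.3$ enters through the fixed pair `u1=0.0`, `u2=0.3`; a feedback strategy would recompute the controls at each step from the current state and the mean-field factor.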


Figure 3: Effect of the mean-field term when the effort is $e = u_2 - u_1 = 0.3$.
Open-loop case. (Noise added using $\sigma W$.)

As we can see in Figure 3, the mean-field term $[\int_y y^{\alpha} m(t,dy)]^{1/\alpha}$
significantly affects the state dynamics in a multiplicative manner. The infection
time increases rapidly with the mean-field term. Figure 4 uses a state feedback
strategy in the form of $[\int_y y^{\alpha} m(t,dy)]^{1/\alpha} x$, with $\alpha = 1.2$.
We observe that the infection state is significantly reduced compared to the
open-loop case of Figure 3. This is illustrated in Figure 5 for several initial
states. Hence, it is important to strategically control the mean-field term so that
the number of infected machines remains limited and the damage is minimized. In
order to carry out such a minimization we introduce below some objective functions.
Figure 4: Effect of the mean-field term under state and mean-field feedback
strategies.

Figure 5: Feedback strategies help to control and maintain the state below a
certain range with high probability. (Axis label: infection level.)

4.3 Objectives 

In the context of delay/disruption tolerant networks it can be shown that the
delay and the probability of receiving the information have a natural risk-sensitive
structure via Poisson arrival rates. However, the attacker and the network
defense may not have the same sensitivity when facing the risk. We denote by
$\theta_1 = \theta_a$ the attacker risk-sensitivity index and by
$\theta_2 = \theta_d$ the defender risk-sensitivity index. The cost of the attacker
is $f_1(x,m,u_1,u_2) = \frac{1}{2}u_1^2$ and its terminal benefit (with opposite
sign for minimization) is $h_1 = -\frac{1}{\alpha}x^{\alpha} - \frac{1}{\alpha}m_{\alpha}$,
where $m_{\alpha}$ denotes the $\alpha$-th moment of the state process. The goal for
the attacker is then to find a tradeoff between the attack effort cost
$\frac{1}{2}u_1^2$ and the benefit $\frac{1}{\alpha}x^{\alpha} + \frac{1}{\alpha}m_{\alpha}$.

Figure 6: A typical one-step cost function. (Panel title: typical shape of cost.)

The cost of the defender (who could be the system administrator) is decomposed
into damage cost and security investment loss:
$f_2 = \frac{1}{2}x^2 + \frac{1}{2}u_2^2$, $h_2 = \frac{c_1}{\alpha}x^{\alpha} + \frac{c_2}{\alpha}m_{\alpha}$,
$c_1 > 0$, $c_2 > 0$. For simplicity, the drift is chosen as $\|b\|_{\alpha}$
where $b(t,x,y,u_1,u_2) = y[\gamma(x) + u_1 - u_2]$, $\gamma(x) > 1$. The control
variables are limited to the interval $[0,1]$ at any time and the diffusion
coefficient is system-size dependent, denoted $\sigma_n(t)$, where $n(t)$ is a
random variable representing the (active) system size at $t$. Since there are
multiple defense strategies, here we pool them together in a cooperative manner
as an ideal target. However, as observed in practice, the defenders may not
coordinate their defense strategies due to non-alignment of objectives and/or
professional privacy issues. Figure 6 represents a typical instantaneous cost.
In Figure 7 we plot the evolution of a typical (random) cost invested into
security over time.

These functions are not bounded, and we cannot directly use the existence
results established above. However, we provide the optimality equation for the
interior case and derive a risk-sensitive SMP.

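$H_i = m_{\alpha}^{1/\alpha}[\gamma(x) + u_1 - u_2]p_i + f_i.$

The attacker's optimal strategy is

$u_1 = [-m_{\alpha}^{1/\alpha} p_1]_0^1,$

where $[a]_0^1 := \min(1, \max(0, a))$. The defender's optimal strategy is

$u_2 = [m_{\alpha}^{1/\alpha} p_2]_0^1,$

where $p_1, p_2$ solve the risk-sensitive SMP system:

$dp_i = -\{H_{i,x} + \frac{1}{v_i}E[v_i \partial_x H_{i,m}]\}dt + q_i(-\theta_i l_i\,dt + dB),$

where $H_{i,x} = m_{\alpha}^{1/\alpha}\gamma'(x)p_i + f_{i,x}$, $f_{1,x} = 0$, $f_{2,x} = x$,
$H_{i,m}(\cdot;t,x,m)(x) = \frac{x^{\alpha}}{\alpha m_{\alpha}^{1-1/\alpha}}[\gamma(x) + u_1 - u_2]p_i$, and

$\partial_x H_{i,m}(\cdot;t,x,m,u)(x) = \frac{x^{\alpha-1}}{m_{\alpha}^{1-1/\alpha}}[\gamma(x) + u_1 - u_2]p_i + \frac{x^{\alpha}}{\alpha m_{\alpha}^{1-1/\alpha}}\gamma'(x)p_i.$

Figure 7: Evolution of the cost invested into security. (Noise added using $\sigma W$.)
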
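The projected best responses can be sketched as follows. This is a hedged illustration that assumes quadratic effort costs $\frac{1}{2}u_i^2$ in $f_i$, so that the first-order condition of $H_i$ in $u_i$ gives $u_1 = -m_{\alpha}^{1/\alpha}p_1$ and $u_2 = m_{\alpha}^{1/\alpha}p_2$ before projection onto $[0,1]$. The function names and the sample adjoint values are hypothetical.

```python
def clip01(a):
    """Projection [a]_0^1 = min(1, max(0, a)) onto the admissible interval [0, 1]."""
    return min(1.0, max(0.0, a))


def best_responses(p1, p2, m_alpha, alpha):
    """Projected interior optimizers of H_1 in u_1 and H_2 in u_2.

    Assumes f_i contains the quadratic effort cost u_i^2 / 2 (an assumption
    of this sketch), with m_alpha the alpha-th moment of the state.
    """
    scale = m_alpha ** (1.0 / alpha)
    u1 = clip01(-scale * p1)  # attacker: dH_1/du_1 = scale * p1 + u1 = 0
    u2 = clip01(scale * p2)   # defender: dH_2/du_2 = -scale * p2 + u2 = 0
    return u1, u2


print(best_responses(p1=-0.5, p2=0.25, m_alpha=1.0, alpha=1.2))
```

Whenever the unconstrained optimizer falls outside $[0,1]$, the projection returns the nearest endpoint, which is why the interior-case SMP only applies where the clipping is inactive.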

4.4 Backward-Forward System 

$dp_1 = -\{H_{1,x} + \frac{1}{v_1}E[v_1\partial_x H_{1,m}]\}dt + q_1(-\theta_1 l_1\,dt + dB),$
$dp_2 = -\{H_{2,x} + \frac{1}{v_2}E[v_2\partial_x H_{2,m}]\}dt + q_2(-\theta_2 l_2\,dt + dB),$
$dx^u(t) = b(\cdot,t,x^u(t),m^u(t),u(t))\,dt + \sigma\,dB(t),$
$x^u(0) = x_0, \quad m^u(t) := \mathcal{L}(x^u(t)),$ (35)

where $v_i$, $l_i$ and the terminal conditions solve the risk-sensitive SMP with the
optimal strategies $u_1$, $u_2$.

Figure 8: Mean-field over time, the initial and final distribution, and the
evolution of the expected values.

We investigate (35) numerically using the stochastic Euler scheme (also called
the Euler-Maruyama scheme). We choose $\alpha = 1.2$. We set $\theta_1 = 0.1$,
$\theta_2 = 0.3$, $c_1 = 0.8 = c_2$. The initial distribution is concentrated
around two points: 1 and 2. This can be extended to capture a geographical area
where the attacks are concentrated in two main countries. In Figure 8 we plot the
state distribution over time, the initial and final distribution, and the evolution
of the expected values. We observe that the distribution moves progressively
towards higher states.

5 Concluding remarks 

In this paper we have studied mean-field-type games with a drift that has an
$L^{\alpha}$-norm structure. Although this norm is not differentiable, it is possible
to get existence of solutions. We have established a relationship between the
risk-sensitive SMP and the dual functions. This allows us to verify the
risk-sensitive SMP equations. When the drift is mean-field free, we retrieve the
classical risk-sensitive equations. The work can be extended in several ways.
First, the interaction model in the drift $b(t,x,y,u)$ (which is a pairwise
interaction) can be modified to include $k$-wise interaction in the form
$b(t,x,y_1,\ldots,y_k,u)$ with the measure $\prod_{i=1}^{k} m(t,dy_i)$. This is, in
particular, useful for the control of virus spread over a network where the
interaction involves multiple nodes at a time. Second, explicit solutions or a
qualitative analysis of the SMP need to be worked out. Third, when $\alpha < 1$,
we do not have a norm and the triangle inequality does not hold. In that case,
quasi-norm type inequalities need to be established. We leave these open issues
for future research.
Appendix

A Mean-field convergence 

The extension of (i) the law of large numbers, (ii) the central limit theorem and
(iii) the large deviation principle, from independent random variables to sequences
of indistinguishable random variables has drawn the attention of a number of
researchers ever since the work of Blum, Chernoff and co-authors [30].
Below we present some well-known results and explain how they can be used in
the McKean-Vlasov context with the $L^{\alpha}$ norm.
A.1 Indistinguishability

The notion of indistinguishability (or exchangeability or interchangeability) is
introduced in order to discuss the existence of a limiting measure and mean-field
convergence of the empirical measure of virtual particle states in the framework
of de Finetti-Hewitt-Savage (JU [321 ISSl 1331133] . 

Let $\mathcal{X}$ be a separable, complete and metrizable topological space (Polish
space).

Definition 3 (Indistinguishability). A collection $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$
of $\mathcal{X}$-valued random variables/processes is indistinguishable (or
exchangeable) if the joint law is invariant by permutation over the index set
$\{1,\ldots,n\}$, i.e., for any permutation $\sigma$ over the set $\{1,2,\ldots,n\}$,
one has

$\mathcal{L}(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \mathcal{L}(x_{(\sigma(1))}, \ldots, x_{(\sigma(n))}),$ (36)

where $\mathcal{L}(X)$ denotes the law of the random variable $X$. An infinite family
of random variables/processes $(x_{(1)}, x_{(2)}, \ldots)$ is indistinguishable if, for
every finite $n$, the family $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is indistinguishable.

This says that the order (position) of the random variable in the family
does not change the joint distribution. From (36) we also have that, for any
measurable operator $O$,


C 


^X(i) , X(^2) ; • ■ • ; ^(n): 




(37) 


where we do not permute the last component. 

For indistinguishable random variables/processes, the convergence of the
empirical measure $m_n := \frac{1}{n}\sum_{i=1}^{n} \delta_{x_{(i)}}$ has been widely
studied. This sits at the intersection between group theory and probability theory.
The symmetry group properties have been used to derive some properties of the
distributions of the processes. The de Finetti-Hewitt-Savage theory provides the
mean-field convergence of such a measure-valued process. When studying convergence
of measures, an important issue is the choice of probability metric. In order to
measure the gap between two probability measures, we introduce the Wasserstein
(Vasershtein) metric (also called Monge-Kantorovich metric) $d_{\alpha}$ of order
$\alpha \geq 1$.


Definition 4 (Wasserstein).

$d_{\alpha}(\mu,\nu) = \inf\Big\{\Big(\int d_0(x,y)^{\alpha}\,\gamma(dx,dy)\Big)^{1/\alpha} :\ \gamma \in \mathcal{P}(\mathcal{X}\times\mathcal{X}),\ \gamma_x = \mu,\ \gamma_y = \nu\Big\},$

where $\gamma_x$ denotes the marginal with respect to the $x$-component, and $d_0$
is a reference metric on $\mathcal{X}$ (such a metric exists because $\mathcal{X}$
is assumed to be metrizable).

The famous Kantorovich-Rubinstein theorem (1958) gives a dual representation
of $d_1$ in terms of a bounded-Lipschitz metric:

$d(\mu,\nu) := d_1(\mu,\nu) = \sup\Big\{\Big|\int \phi\,d(\mu-\nu)\Big| :\ \|\phi\|_{Lip} \leq 1\Big\},$

where $\|\phi\|_{Lip} = \|\phi\|_{\infty} + \sup_{x \neq y}\frac{|\phi(x)-\phi(y)|}{d_0(x,y)}$
is the Lipschitz norm of $\phi$. It can be shown that $d_{\alpha}$ is a metric (a
"true" distance in a topological sense), i.e., it satisfies the axioms of a metric.
For Polish spaces $\mathcal{X}$, the Wasserstein distance $d_1$ is known to metrize
the weak topology on the space of probability measures over $\mathcal{X}$. As stated
in Villani's book [35], the Wasserstein distance has the following properties: for
any $1 \leq \alpha < +\infty$,

$\lim_n d_{\alpha}(m_n, m) = 0$

implies, in particular, that

• $m_n$ converges to $m$ in distribution (weak convergence of probability
measures), i.e., $E_{m_n}[\phi] := \int \phi\,dm_n \to E_m[\phi]$ as $n \to +\infty$,
for any measurable, bounded and Lipschitz function $\phi$.

• $\int d_0(x,y)\,m_n(dy) < +\infty$ for some $x \in \mathcal{X}$.

Thanks to these nice properties, the Wasserstein distance $d_{\alpha}$ is an
appropriate candidate for the convergence of the empirical measure in the weak sense.
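For intuition, the Wasserstein distance between two empirical measures on the real line with the same number of atoms can be computed by matching sorted samples, since the monotone coupling is optimal for $d_0(x,y) = |x-y|$ and $\alpha \geq 1$. The sketch below uses that special case only; the function name `wasserstein_1d` and the sample values are illustrative assumptions.

```python
def wasserstein_1d(xs, ys, alpha=1.0):
    """d_alpha between two empirical measures on R with equally many atoms.

    Uses the monotone (sorted) coupling, which is optimal in one dimension
    for the cost |x - y|^alpha with alpha >= 1.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    xs, ys = sorted(xs), sorted(ys)
    mean_cost = sum(abs(x - y) ** alpha for x, y in zip(xs, ys)) / len(xs)
    return mean_cost ** (1.0 / alpha)


sample = [0.0, 1.0, 2.0]
shifted = [2.5, 0.5, 1.5]          # same atoms shifted by 0.5, in another order
print(wasserstein_1d(sample, shifted, alpha=1.2))
```

A pure translation of all atoms by a constant gives a distance equal to that constant for every order $\alpha$, which is a convenient sanity check.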

Theorem 1 (de Finetti-Hewitt-Savage). Let $x_{(1)}, x_{(2)}, \ldots$ be an
indistinguishable sequence of $\mathcal{X}$-valued random variables, where
$\mathcal{X}$ is a Polish space. Then, there is a $\mathcal{P}(\mathcal{X})$-valued
random measure $m$ such that

$m = \lim_{n \to +\infty} \frac{1}{n}\sum_{i=1}^{n} \delta_{x_{(i)}}$ almost surely,

where $\mathcal{P}(\mathcal{X})$ denotes the space of probability measures on
$\mathcal{X}$. Conditioned on $m$, the random variables $x_{(1)}, x_{(2)}, \ldots$ are
i.i.d. with distribution $m$, that is, for each measurable bounded function $\phi$,

$E(\phi(x_{(1)}, x_{(2)}, \ldots, x_{(k)}) \mid m) = \int \phi(y_1, \ldots, y_k)\,m(dy_1) \cdots m(dy_k).$

In addition, if the moments of the $x_{(i)}$ are finite then

$E[d(m_n, m)] \leq \frac{C_1}{\sqrt{n}},$

where $C_1 > 0$ and $d = d_1$ denotes the Wasserstein metric of order one.
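A classical toy instance of this statement is a mixture of coin flips: a random bias is drawn once, and then all flips are independent given that bias. The sequence is exchangeable but not independent, and its empirical mean converges to the random bias, which plays the role of the directing measure $m$. The sketch below is an illustration under these assumptions; `exchangeable_bernoulli` and the seed are hypothetical.

```python
import random


def exchangeable_bernoulli(n, rng):
    """Draw a random bias theta, then n flips that are i.i.d. given theta."""
    theta = rng.random()  # directing parameter: m = Bernoulli(theta)
    flips = [1 if rng.random() < theta else 0 for _ in range(n)]
    return theta, flips


rng = random.Random(42)
theta, flips = exchangeable_bernoulli(100000, rng)
empirical_mean = sum(flips) / len(flips)
print(theta, empirical_mean)  # the empirical mean tracks the random bias
```

Repeating the experiment with a new seed gives a different limit, which is the sense in which the limiting measure is itself random.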
Note that the convergence in Theorem 1 is in the weak sense since the
Monge-Kantorovich distance $d_1$ metrizes the weak topology. Theorem 1 was proved
by de Finetti (1931, [31]) for infinite binary sequences and was extended
by Hewitt and Savage (1955, [32]) to continuous and compact state spaces. A
simple and elegant proof can be found in Aldous (1985), pp. 18-22, for the
general state space. The rate of convergence for the Monge-Kantorovich distance is
obtained along the lines of the law of large numbers for interacting systems.
Theorem 1 was initially used for static (time-independent) maps. Then, several
applications in mathematical physics and biology, with dynamical models, came
into the picture. These are dynamically interacting particles, genes, molecules
or nodes. Theorem 1 was then extended to the dynamical case in at least two
ways: (i) pathwise (up to a certain time step $T$), (ii) at each time step $t$.

A.2 Large Deviation Principle for $m_n$

We say that, for any time $t$, the probability measures $(m_n(t))_{n \geq 0}$ on a
topological space obey a Large Deviation Principle with rate function $I(t,\cdot)$
and in the scale $(a_n)_n$ if $(a_n)_n$ is a real-valued sequence satisfying
$a_n \to \infty$ and $I$ is a non-negative, lower semicontinuous function such that

$-\inf_{x \in int(B)} I(t,x) \leq \liminf_n \frac{1}{a_n}\log m_n(t,B)$
$\quad \leq \limsup_n \frac{1}{a_n}\log m_n(t,B) \leq -\inf_{x \in cl(B)} I(t,x),$

for any measurable set $B$, whose interior is denoted by $int(B)$ and closure by
$cl(B)$. If the level sets $\{x : I(t,x) \leq \beta\}$ are compact for every
$\beta < +\infty$, $I(t,\cdot)$ is called a good rate function. We introduce $H$ as
the relative entropy function (defined also above)

$H(\mu \mid \nu) := \int \log\big(\frac{d\mu}{d\nu}\big)\,d\mu,$

if $\mu$ is absolutely continuous with respect to $\nu$ and $+\infty$ otherwise.

The main advantage of having this type of result is the decay of $m_n(t,B)$
as $n$ gets large. Basically, when the two limits are identical, $m_n(t,B)$ is of
the order of $e^{-a_n R}$ where $R = \inf_{x \in B} I(t,x) > 0$. As a consequence,
the weak convergence from Theorem 1 and central limit theorems can be derived from
these inequalities.
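The exponential decay can be seen on the simplest textbook case, which is not the measure-valued setting of this appendix: for the mean of $n$ fair coin flips, Cramér's theorem gives the scale $a_n = n$ and the rate $I(a) = a\log(2a) + (1-a)\log(2(1-a))$. The sketch below compares $-\frac{1}{n}\log P(\text{mean} \geq a)$, computed exactly, with $I(a)$. The function names and the choice $n = 2000$, $a = 0.6$ are illustrative assumptions.

```python
import math


def rate(a, p=0.5):
    """Cramer rate function of a Bernoulli(p) sample mean at level a."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def log_tail(n, a):
    """log P(S_n >= n*a) for S_n ~ Binomial(n, 1/2), using exact integers."""
    k0 = math.ceil(n * a)
    count = sum(math.comb(n, k) for k in range(k0, n + 1))
    return math.log(count) - n * math.log(2)


n, a = 2000, 0.6
estimate = -log_tail(n, a) / n
print(estimate, rate(a))  # the gap shrinks like log(n)/n as n grows
```

The remaining gap comes from the polynomial prefactor in front of $e^{-nI(a)}$, which the logarithmic scale of a large deviation principle ignores.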

The next result provides a large deviation principle [35].

Theorem 2. Assume that initially $m_n(0)$ follows a large deviation principle
with rate $I(0, m(0))$ on the set of probability measures $\mathcal{P}(\mathcal{X})$.
Then $(m_n(t,\cdot))_{t \in [0,T]}$ follows a large deviation principle on the set of
cadlag functions from $[0,T]$ to $\mathcal{P}(\mathcal{X})$ with good rate

$I(t, m(t)) = I(0, m(0)) + \int_0^t H(m(s) \mid m(0))\,ds$ if $m(s)$ is a.c.,
$I(t, m(t)) = +\infty$ otherwise,

where a.c. means absolutely continuous.

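Definition 5. Consider two processes $(x(t))_{t \in [0,T]}$ and $(y(t))_{t \in [0,T]}$
and set

$D_{T,\alpha}(\mu,\nu) := \inf\Big\{\Big(E\big[\sup_{t \in [0,T]} d_0(x(t),y(t))^{\alpha}\big]\Big)^{1/\alpha} :\ \mathcal{L}(x) = \mu,\ \mathcal{L}(y) = \nu\Big\}.$

We now prove Theorem 1 in several steps: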

A.3 Existence of solution to the state equation 

We start with the existence of a solution. To prove existence of a solution with
respect to the Wasserstein distance, we adopt a contraction-type approach.
Then, we construct a Cauchy sequence (in the $L^{\alpha}$ space, which is a Polish
space). Consequently, the solution is almost surely unique. Consider the SDE given by

$x(t) = x(0) + \int_0^t \sigma(\cdot,s,x(s))\,dB(s) + \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x(s),y,u)\,m(s,dy)\Big]^{1/\alpha} ds =: RH(x)[m].$

Then one gets a fixed-point stochastic equation in $m$:
$m = \mathcal{L}(x(t)) = \mathcal{L}(RH(x)[m])$.

Consider two measures $m_1(s,dy)$ and $m_2(s,dy)$ such that
$D_{t,\alpha}(m_1,m_2) < +\infty$; then

$D_{t,\alpha}(\mathcal{L}(RH(x)[m_1]), \mathcal{L}(RH(x)[m_2])) \leq c\int_0^t D_{s,\alpha}(m_1,m_2)\,ds,$

for any $t \in [0,T]$. Let $c > 0$, $\Phi(f)(t) \leq c\int_0^t f(s)\,ds$, $t \in [0,T]$,
for some operator $\Phi$, and $M = \sup_{s \in [0,T]} f(s)$. Then $\Phi(f)(s) \leq cMs$ and

$\Phi^2(f)(t) \leq c\int_0^t \Phi(f)(s)\,ds \leq c\int_0^t [cMs]\,ds = c^2\frac{t^2}{2}M.$

Then, by induction, $\Phi^k(f)(t) \leq \frac{c^k t^k}{k!}M$, and

$D_{t,\alpha}(\mathcal{L}^{k+1}(\delta_0), \mathcal{L}^k(\delta_0)) \leq \frac{c_T^k}{k!}D_{T,\alpha}(\mathcal{L}(\delta_0), \delta_0) < +\infty,$

where $c_T$ is a constant depending on $c$ and $T$, and where we have used the
Lipschitz continuity (with constant $L$) of $b$ and the Minkowski inequality for
$\alpha \geq 1$. By summing over natural numbers $k$, one gets
$\sum_k D_{t,\alpha}(\mathcal{L}^{k+1}(\delta_0), \mathcal{L}^k(\delta_0)) < +\infty$.
Thus, $(\mathcal{L}^k(\delta_0))_{k \geq k_0}$ is a Cauchy sequence with respect to the
metric $D_{t,\alpha}$, for $k_0 \geq 1$. Since we are in a complete metric space, this
sequence converges to some fixed point (say, $m$). Then
$m = \mathcal{L}(RH(x)[m]) = \mathcal{L}(m)$ is the unique fixed point, solution of
the SDE.
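A.4 Pathwise mean-field convergence

Consider $n$ independent random processes $x^*_{i,n}$ satisfying

$x^*_{i,n}(t) = x^*_{i,n}(0) + \int_0^t \sigma(\cdot,s,x^*_{i,n}(s))\,dB(s) + \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m^*(s,dy)\Big]^{1/\alpha} ds,$

$m^*(t,\cdot) = \mathcal{L}(x^*(t))$, and the particle representation

$x_{i,n}(t) = x_{i,n}(0) + \int_0^t \sigma(\cdot,s,x_{i,n}(s))\,dB(s) + \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x_{i,n}(s),y,u)\,m_n(s,dy)\Big]^{1/\alpha} ds,$

where $m_n(t,\cdot) = \frac{1}{n}\sum_{j=1}^{n} \delta_{x_{j,n}(t)}$.

It suffices to prove the statement for the coefficient with the mean-field term.
We take the difference between the two drift terms:

$D = \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m^*(s,dy)\Big]^{1/\alpha} ds$ (38)
$\quad - \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x_{i,n}(s),y,u)\,m_n(s,dy)\Big]^{1/\alpha} ds.$ (39)

We decompose $D$ into three separate terms as follows, where
$m^*_n(s,\cdot) = \frac{1}{n}\sum_{j=1}^{n} \delta_{x^*_{j,n}(s)}$ denotes the empirical
measure of the independent processes:

$I_1 = \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m^*(s,dy)\Big]^{1/\alpha} ds - \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m^*_n(s,dy)\Big]^{1/\alpha} ds,$ (40)

$I_2 = \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m^*_n(s,dy)\Big]^{1/\alpha} ds - \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m_n(s,dy)\Big]^{1/\alpha} ds,$ (41)

$I_3 = \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m_n(s,dy)\Big]^{1/\alpha} ds - \int_0^t \Big[\int_y |b|^{\alpha}(\cdot,s,x_{i,n}(s),y,u)\,m_n(s,dy)\Big]^{1/\alpha} ds,$ (42)

$D = I_1 + I_2 + I_3.$ (43)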
The first term $I_1$ (see (40)) deals only with i.i.d. random variables. Therefore,
the convergence for that part is classical. For the second term $I_2$, we use the
triangle inequality for $\alpha \geq 1$. By Lipschitz continuity of $b$, we get
$|I_2| \leq L\int_0^t \big[\frac{1}{n}\sum_{j=1}^{n} |x^*_{j,n}(s) - x_{j,n}(s)|^{\alpha}\big]^{1/\alpha} ds$.
By Lipschitz continuity of $b$,
$|I_3| \leq L\int_0^t |x^*_{i,n}(s) - x_{i,n}(s)|\,ds$.

By Holder inequality 


< 



(44) 


where

$Y_{n,s} = \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m^*(s,dy)\Big]^{1/\alpha} - \Big[\int_y |b|^{\alpha}(\cdot,s,x^*_{i,n}(s),y,u)\,m^*_n(s,dy)\Big]^{1/\alpha}.$ (45)



Summing over i and using Gronwall Lemma yields 


{F;[ sup |a;*„(s)-a:i,„(s)|“]}^/“ 
se[o,T] 

< ds. 

Jo 

which provides a convergence rate of y/n since 


sup \/n < E sup 

n I sG[0.T] 




l/a 

< + 00 . 


This result shows that when $\alpha \geq 1$, and when the initial states of the
virtual particles are mutually independent, with the same distribution as $x(0)$,
the particle interaction model with fixed control $u$ soon destroys that
independence through the empirical measure $m_n$. But, for a given finite time $t$,
when the number of particles becomes large, the mean-field convergence implies that
the particles become approximately independent again conditionally on $m(t,\cdot)$,
so that independence is still retained. This is called propagation of chaos. Note
that this result is limited to a finite horizon. For long-term behavior one needs to
study the asymptotics (infinite horizon in time) of the SDEs in order to derive
the propagation (or non-propagation) of chaos property.